11 research outputs found
Optimal Deceptive and Reference Policies for Supervisory Control
The use of deceptive strategies is important for an agent that attempts not
to reveal his intentions in an adversarial environment. We consider a setting
in which a supervisor provides a reference policy and expects an agent to
follow the reference policy and perform a task. The agent may instead follow a
different, deceptive policy to achieve a different task. We model the
environment and the behavior of the agent with a Markov decision process,
represent the tasks of the agent and the supervisor with linear temporal logic
formulae, and study the synthesis of optimal deceptive policies for such
agents. We also study the synthesis of optimal reference policies that prevent
deceptive strategies of the agent and achieve the supervisor's task with high
probability. We show that the synthesis of deceptive policies has a convex
optimization problem formulation, while the synthesis of reference policies
requires solving a nonconvex optimization problem. Comment: 20 pages
Smooth Convex Optimization using Sub-Zeroth-Order Oracles
We consider the problem of minimizing a smooth, Lipschitz, convex function
over a compact, convex set using sub-zeroth-order oracles: an oracle that
outputs the sign of the directional derivative for a given point and a given
direction, an oracle that compares the function values for a given pair of
points, and an oracle that outputs a noisy function value for a given point. We
show that the sample complexity of optimization using these oracles is
polynomial in the relevant parameters. The optimization algorithm that we
provide for the comparator oracle is the first algorithm with a known rate of
convergence that is polynomial in the number of dimensions. We also give an
algorithm for the noisy-value oracle that incurs a regret of
(ignoring the other factors and
logarithmic dependencies) where is the number of dimensions and is the
number of queries. Comment: Extended version of the accepted paper in the 35th AAAI Conference on
Artificial Intelligence 2021. 19 pages including supplementary material
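To make the comparator oracle concrete, here is a minimal sketch of minimizing a convex function when the only available feedback is which of two points has the smaller value. It uses one-dimensional ternary search, which is a textbook technique and not the algorithm from the paper; the names `compare` and `minimize_with_comparisons` are hypothetical.

```python
def compare(f, x, y):
    """Comparator oracle: reveals only whether f(x) <= f(y), never the values."""
    return f(x) <= f(y)


def minimize_with_comparisons(f, lo, hi, tol=1e-6):
    """Ternary search on [lo, hi] for a convex f, using comparisons only.

    Assumption: f is convex on the interval, so each comparison lets us
    discard one third of the interval without losing a minimizer.
    """
    while hi - lo > tol:
        a = lo + (hi - lo) / 3
        b = hi - (hi - lo) / 3
        if compare(f, a, b):
            hi = b  # a minimizer lies in [lo, b]
        else:
            lo = a  # a minimizer lies in [a, hi]
    return (lo + hi) / 2


# Illustrative objective (not from the paper): minimizer at x = 0.3.
x_star = minimize_with_comparisons(lambda x: (x - 0.3) ** 2, 0.0, 1.0)
```

Each query shrinks the interval by a constant factor, so the number of comparisons grows logarithmically in the target accuracy in one dimension. The difficulty the abstract addresses is obtaining polynomial dependence on the dimension, which this sketch does not attempt.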
Alternating Direction Method of Multipliers for Decomposable Saddle-Point Problems
Saddle-point problems appear in various settings including machine learning,
zero-sum stochastic games, and regression problems. We consider decomposable
saddle-point problems and study an extension of the alternating direction
method of multipliers to such saddle-point problems. Instead of solving the
original saddle-point problem directly, this algorithm solves smaller
saddle-point problems by exploiting the decomposable structure. We show the
convergence of this algorithm for convex-concave saddle-point problems under a
mild assumption. We also provide a sufficient condition for which the
assumption holds. We demonstrate the convergence properties of the saddle-point
alternating direction method of multipliers with numerical examples on a power
allocation problem in communication channels and a network routing problem with
adversarial costs. Comment: Accepted to the 58th Annual Allerton Conference on Communication,
Control, and Computing
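For background, the following is a minimal sketch of the standard alternating direction method of multipliers for a minimization problem, not the saddle-point extension studied in the paper. It solves the scalar problem of minimizing 0.5 (x - a)^2 + lam |z| subject to x = z, in scaled form. The function names and the values of `a`, `lam`, and `rho` are illustrative assumptions.

```python
def soft_threshold(v, kappa):
    """Proximal operator of kappa * |.| (shrinks v toward zero by kappa)."""
    if v > kappa:
        return v - kappa
    if v < -kappa:
        return v + kappa
    return 0.0


def admm_scalar_lasso(a, lam, rho=1.0, iterations=200):
    """Scaled-form ADMM for: minimize 0.5*(x - a)**2 + lam*|z|  s.t.  x = z."""
    x = z = u = 0.0
    for _ in range(iterations):
        # x-update: minimize 0.5*(x - a)**2 + (rho/2)*(x - z + u)**2
        x = (a + rho * (z - u)) / (1.0 + rho)
        # z-update: proximal step on lam*|z|
        z = soft_threshold(x + u, lam / rho)
        # scaled dual update: accumulate the constraint residual x - z
        u = u + x - z
    return x, z


x_sol, z_sol = admm_scalar_lasso(a=3.0, lam=1.0)
```

The point of the splitting is that each update touches only one of the two terms, which is the same decomposition idea the abstract exploits to break a large saddle-point problem into smaller ones.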
Differential Privacy in Cooperative Multiagent Planning
Privacy-aware multiagent systems must protect agents' sensitive data while
simultaneously ensuring that agents accomplish their shared objectives. Towards
this goal, we propose a framework to privatize inter-agent communications in
cooperative multiagent decision-making problems. We study sequential
decision-making problems formulated as cooperative Markov games with
reach-avoid objectives. We apply a differential privacy mechanism to privatize
agents' communicated symbolic state trajectories, and then we analyze tradeoffs
between the strength of privacy and the team's performance. For a given level
of privacy, this tradeoff is shown to depend critically upon the total
correlation among agents' state-action processes. We synthesize policies that
are robust to privacy by reducing the value of the total correlation. Numerical
experiments demonstrate that the team's performance under these policies
decreases by only 3 percent when comparing private versus non-private
implementations of communication. By contrast, the team's performance decreases
by roughly 86 percent when using baseline policies that ignore total
correlation and only optimize team performance.
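As a rough illustration of the quantity the abstract refers to, the sketch below computes total correlation for a small discrete joint distribution, using the standard definition: the sum of the marginal entropies minus the joint entropy. The paper applies the notion to agents' state-action processes; the static two-variable example and the function names here are simplifying assumptions.

```python
from collections import defaultdict
from math import log2


def entropy(dist):
    """Shannon entropy in bits of a dict mapping outcomes to probabilities."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)


def total_correlation(joint):
    """Sum of marginal entropies minus joint entropy.

    `joint` maps tuples (one entry per agent) to probabilities.
    """
    n = len(next(iter(joint)))
    marginals = [defaultdict(float) for _ in range(n)]
    for outcome, p in joint.items():
        for i, value in enumerate(outcome):
            marginals[i][value] += p
    return sum(entropy(m) for m in marginals) - entropy(joint)


# Two agents whose binary states always agree: fully dependent.
coupled = {(0, 0): 0.5, (1, 1): 0.5}
# Two agents whose binary states are independent fair coins.
independent = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}

tc_coupled = total_correlation(coupled)
tc_independent = total_correlation(independent)
```

Total correlation is zero exactly when the variables are independent, which gives some intuition for why policies with low total correlation would be less sensitive to noise added to inter-agent communication.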
Formal Methods for Autonomous Systems
Formal methods refer to rigorous, mathematical approaches to system
development and have played a key role in establishing the correctness of
safety-critical systems. The main building blocks of formal methods are models
and specifications, which are analogous to behaviors and requirements in system
design and give us the means to verify and synthesize system behaviors with
formal guarantees.
This monograph provides a survey of the current state of the art on
applications of formal methods in the autonomous systems domain. We consider
correct-by-construction synthesis under various formulations, including closed
systems, reactive, and probabilistic settings. Beyond synthesizing systems in
known environments, we address the concept of uncertainty and bound the
behavior of systems that employ learning using formal methods. Further, we
examine the synthesis of systems with monitoring, a mitigation technique for
ensuring that once a system deviates from expected behavior, it knows a way of
returning to normalcy. We also show how to overcome some limitations of formal
methods themselves with learning. We conclude with future directions for formal
methods in reinforcement learning, uncertainty, privacy, explainability of
formal methods, and regulation and certification.
On the Sample Complexity of Vanilla Model-Based Offline Reinforcement Learning with Dependent Samples
Offline reinforcement learning (offline RL) considers problems where learning is performed using only previously collected samples and is helpful in settings where collecting new data is costly or risky. In model-based offline RL, the learner performs estimation (or optimization) using a model constructed according to the empirical transition frequencies. We analyze the sample complexity of vanilla model-based offline RL with dependent samples in the infinite-horizon discounted-reward setting. In our setting, the samples obey the dynamics of the Markov decision process and, consequently, may have interdependencies. Under no assumption of independent samples, we provide a high-probability, polynomial sample complexity bound for vanilla model-based off-policy evaluation that requires partial or uniform coverage. We extend this result to off-policy optimization under uniform coverage. As a comparison to the model-based approach, we analyze the sample complexity of off-policy evaluation with vanilla importance sampling in the infinite-horizon setting. Finally, we provide an estimator that outperforms the sample-mean estimator for almost deterministic dynamics that are prevalent in reinforcement learning.
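The model-based procedure described above can be sketched as follows: build a model from empirical transition frequencies along a single trajectory (so consecutive samples are dependent), then evaluate a fixed policy on that model in the discounted setting. This is a minimal illustration under assumed names and toy data, not the estimator or the analysis from the paper.

```python
from collections import defaultdict


def empirical_model(trajectory):
    """Estimate transitions and mean rewards from (s, a, r, s_next) samples."""
    counts = defaultdict(lambda: defaultdict(int))
    reward_sum = defaultdict(float)
    for s, a, r, s_next in trajectory:
        counts[(s, a)][s_next] += 1
        reward_sum[(s, a)] += r
    P, R = {}, {}
    for sa, nexts in counts.items():
        n = sum(nexts.values())
        P[sa] = {s2: c / n for s2, c in nexts.items()}  # empirical frequencies
        R[sa] = reward_sum[sa] / n
    return P, R


def evaluate_policy(P, R, policy, gamma=0.9, sweeps=500):
    """Iterative policy evaluation on the estimated model.

    `policy` maps each state to one action. State-action pairs that never
    appear in the data are skipped, i.e. they have no coverage.
    """
    V = defaultdict(float)
    for _ in range(sweeps):
        for s, a in policy.items():
            if (s, a) in P:
                V[s] = R[(s, a)] + gamma * sum(
                    p * V[s2] for s2, p in P[(s, a)].items()
                )
    return dict(V)


# Toy trajectory (hypothetical data): each sample starts where the last ended.
trajectory = [
    (0, "a", 0.0, 1),
    (1, "a", 1.0, 0),
    (0, "a", 0.0, 1),
    (1, "a", 1.0, 1),
    (1, "a", 1.0, 0),
]
P_hat, R_hat = empirical_model(trajectory)
V_hat = evaluate_policy(P_hat, R_hat, policy={0: "a", 1: "a"})
```

Because every sample begins in the state where the previous one ended, the visit counts for each state-action pair are themselves random and correlated, which is the dependence the abstract's sample complexity bound has to account for.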
Influence of Menstrual Cycle on P Wave Dispersion
Female gender is an independent risk factor for some types of arrhythmias. We sought to determine whether the menstrual cycle affects P wave dispersion, which is a predictor of atrial fibrillation. The study population consisted of 59 women in follicular phase (mean age, 29.3 +/- 7.7 years) (group F) and 53 women in luteal phase (mean age, 28.1 +/- 6.8 years) (group L). The ECGs of 35 patients (mean age, 26.4 +/- 4.5) were obtained in both follicular and luteal phase. Both groups underwent a standard 12-lead surface electrocardiogram recorded at 50 mm/s. Maximal (Pmax) and minimal P wave durations (Pmin) were measured. P wave dispersion (PD) was defined as the difference between Pmax and Pmin. PD was significantly higher in group L than group F (46.6 +/- 18.5 versus 40.1 +/- 12.7; P < 0.05). Pmin was significantly lower in group L than group F (51.6 +/- 12.1 versus 59.1 +/- 12.1; P = 0.002). When we compared ECGs in different phases of the 35 patients, PD was significantly higher in luteal phase than follicular phase (53.2 +/- 12.3 versus 42.8 +/- 10.2; P < 0.05). Pmin was significantly lower in luteal phase than follicular phase (47.6 +/- 6.6 versus 56 +/- 10.1; P = 0.05). We detected a significant correlation between the day of the menses and PD (r = 0.27; P < 0.05). PD was increased in luteal phase compared to follicular phase, and this difference was more prominent as the days of the cycle progressed. (Int Heart J 2011; 52: 23-26)
The Value of P wave dispersion in predicting reperfusion and infarct related artery patency in acute anterior myocardial infarction
Purpose: The aim of this study is to investigate whether P wave dispersion (PWD), measured before, during and after fibrinolytic therapy (FT), is able to predict successful reperfusion and infarct related artery (IRA) patency in patients with acute anterior MI who received FT.